Zipf and non-Zipf Laws 
for Homogeneous Markov Chain 
Abstract

Consider an arbitrary homogeneous Markov chain with discrete time and with a finite set of states $E_0, \dots, E_n$, where the state $E_0$ is absorbing (the "space") and $E_1, \dots, E_n$ are nonrecurrent ("letters"). Any trajectory of such a Markov chain (a "word") ends with the state $E_0$, and the sum of the probabilities of all words equals one.

As a rule, the number of all possible words is infinite, and we are interested in the asymptotic behavior of the rate of decrease of the probabilities of trajectories in the sorted frequency list. We prove that in a typical case the asymptotics has a power order, and we determine the exponent from the transition probability matrix. If this matrix is block-diagonal, then for certain specific values of the transition probabilities the power order of the asymptotics acquires corrections. But if the matrix is rather sparse, then the probabilities decrease rapidly, namely, the asymptotics is (sub)exponential. We also establish necessary and sufficient conditions for the exponential order of decrease and obtain a formula for the exponent in terms of the transition probability matrix and the initial distribution vector.

Index Terms: Time-homogeneous Markov chain, finite state space, monkeys typing randomly, rank-frequency distribution, power laws.
1 Introduction

Recently, applications have aroused interest in the nature of power laws and their domains of applicability [1]-[3]. For real-life networks, several models describing the occurrence of the power law have been proposed; the best known one is the preferential attachment model [3]. In linguistics, the mechanisms behind the Zipf and Heaps laws were thoroughly studied as early as the time of B. Mandelbrot [5], [6]. Papers containing empirical studies and mathematical models appear regularly nowadays (see, for example, [7] and the references therein; for the mathematical motivation of this paper see [5]). However, there is no commonly accepted explanation of the fact that in reality, for some values of the parameters, the power law does not adequately describe the process under consideration [3]. Here we try to answer this question by considering the probabilities of occurrence of different trajectories in a homogeneous Markov chain.

Our model arose in the study of a huge data set of Google Books [5]. It turns out [10] that, despite its pre-computer-era origin, the classic Zipf law with exponent one remarkably agrees with the frequencies of several hundred English word forms most commonly occurring in modern books published in one year that are represented in the Google Books database. But when we consider a collection of hundreds of thousands of words, the Mandelbrot modification, in which the exponent of the power law slightly differs from one, turns out to be more adequate. Note that for hieroglyphic scripts the power law is irrelevant [7]. Certain auxiliary considerations in the classification of retrieval requests with respect to the structure of the use of various parts of speech show that the power law adequately describes queries of medium length; in the general case, the situation is more complicated.

As the initial model explaining the power law of the decrease of occurrence frequencies of English words, we consider the word generation process consisting of the sequential independent random appending of symbols (letters and the space), each of which has a fixed probability (the monkey model). This model has a long history, but the power order of the asymptotics of the sorted list of word frequencies has been proved for it only recently [9], [10].
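For readers who want to see the power law of the monkey model numerically, here is a minimal Python sketch under assumed parameters: two equiprobable letters and a space, each of probability 1/3 (the helper name `monkey_rank_list` is ours, not taken from the cited papers). Because all words of the same length are equiprobable, the sorted frequency list is a sequence of constant blocks, so it suffices to look at the last rank of each block.

```python
import math

# Monkey model with equiprobable letters (assumed parameters): every word of
# length L has probability p_letter**L * p_space, and there are n_letters**L
# such words, so the sorted frequency list consists of blocks of equal values.
def monkey_rank_list(n_letters, p_letter, p_space, max_len):
    """Return (rank, probability) at the last position of each length block."""
    points = []
    rank = 0
    for length in range(1, max_len + 1):
        rank += n_letters ** length            # ranks used by words of length <= L
        prob = p_letter ** length * p_space    # common probability of length-L words
        points.append((rank, prob))
    return points

points = monkey_rank_list(n_letters=2, p_letter=1/3, p_space=1/3, max_len=15)
rank, prob = points[-1]
slope = math.log(prob) / math.log(rank)        # ln p(t) / ln t at t = rank
print(f"slope = {slope:.5f}, -log2(3) = {-math.log2(3):.5f}")
```

The ratio $\ln p(t)/\ln t$ at the block ends approaches $-\lg 3 \approx -1.585$; with these parameters the exponent $\lg 3$ equals $1/\beta$ for the root $\beta = \log_3 2$ of $2 \cdot 3^{-\beta} = 1$.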

In this paper we study one natural generalization of this model, namely, the model with a Markov dependence between neighboring symbols. Such a model was studied by B. Mandelbrot [6]; however, he mainly considered a particular case in which the power law occurs. It turns out that the asymptotics of the ordered list of frequencies of the various trajectories of the Markov process (word probabilities) essentially depends on the transition probability matrix. There is an analogy with known limit theorems for Markov chains [11]-[13].

Thus, we consider a homogeneous Markov chain with discrete time and with a finite set of states $E_0, \dots, E_n$ such that

    the states $E_1, \dots, E_n$ are nonrecurrent;    (1)

hence the state $E_0$ is absorbing (see [11], [13] for the terminology and the equivalent statements given below). The goal of this work is to study the frequencies of trajectories in this chain, i.e., of "words" composed of the symbols $E_1, \dots, E_n$ and ending with the "space" $E_0$. As a rule, the number of all possible words is infinite, and we are interested in the asymptotics of the rate of decrease of the probabilities of trajectories in the sorted frequency list. We prove that in a typical case the asymptotics has a power order, and we find the exponent with the help of the transition probability matrix. If the latter is block-diagonal, then for some specific values of the transition probabilities the power asymptotics acquires (logarithmic) corrections. But if this matrix is rather sparse, then the probabilities decrease rapidly, namely, the asymptotics is (sub)exponential. We also establish necessary and sufficient conditions for the exponential order of decrease and obtain a formula for the exponent in terms of the transition probability matrix and the initial distribution vector.

2 The exact statement of the main result

Let $P_0$ be the (stochastic) transition probability matrix of the Markov chain mentioned in the previous paragraph, and let $P$ be its (substochastic) submatrix corresponding to the states $E_1, \dots, E_n$. Denote by $G_0$ the oriented pseudograph with the vertex set $\{0, \dots, n\}$ whose arcs $(i, j)$ are defined by the inequalities $p_{ij} > 0$. Conditions (1) are equivalent to the fact that the graph $G_0$ is (weakly) connected and $\{0\}$ is the unique set of vertices that has no arcs to its complement. Let $G$ be the subgraph of $G_0$ with the vertex set $\{1, \dots, n\}$ that includes all arcs of $G_0$ between these vertices (the subgraph generated by the vertices $\{1, \dots, n\}$). If $H$ is the subgraph of $G_0$ generated by some set of vertices, then we denote by $P_H$ the corresponding submatrix of the matrix $P_0$: $P_H = (p_{ij})_{i,j \in V(H)}$. Thus, for example, $P = P_G$. In addition, we set $P_H(\beta) = (p_{ij}^{\beta})_{i,j \in V(H)}$, where $0^{\beta} = 0$ for any $\beta$.

Recall that a strongly connected component is a maximal subgraph in which every pair of vertices is mutually reachable. Denote by $G'$ the (acyclic) digraph obtained from the graph $G_0$ by contracting each strongly connected component of $G_0$ to a single vertex (in [15] this graph is called the condensation). In this paper the graph $G'$ is connected, and the vertex corresponding to $\{0\}$ is the only vertex having no outgoing arcs.
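The condensation $G'$ can be computed with any standard strongly-connected-components algorithm. The sketch below uses Kosaraju's algorithm (our choice; the text does not prescribe one) on the graph $G_0$ read off from a matrix $P_0$. The function names are hypothetical, and the matrix is an assumed instance of the second diagram of Fig. 1 with all transition probabilities from $E_1, E_2$ equal to 1/2.

```python
def strongly_connected_components(adj):
    """Kosaraju's algorithm; adj maps each vertex to the list of its successors."""
    order, seen = [], set()
    for root in adj:                      # pass 1: record vertices by finishing time
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(adj[root]))]
        while stack:
            v, successors = stack[-1]
            nxt = next((w for w in successors if w not in seen), None)
            if nxt is None:
                stack.pop()
                order.append(v)           # all successors of v are finished
            else:
                seen.add(nxt)
                stack.append((nxt, iter(adj[nxt])))
    radj = {v: [] for v in adj}           # reversed graph
    for v in adj:
        for w in adj[v]:
            radj[w].append(v)
    components, assigned = [], set()
    for root in reversed(order):          # pass 2: DFS on the reversed graph
        if root in assigned:
            continue
        assigned.add(root)
        component, stack = set(), [root]
        while stack:
            v = stack.pop()
            component.add(v)
            for w in radj[v]:
                if w not in assigned:
                    assigned.add(w)
                    stack.append(w)
        components.append(component)
    return components

def condensation(P0):
    """Components of G_0 (arcs i -> j where p_ij > 0) and the arcs of G' between them."""
    n = len(P0)
    adj = {i: [j for j in range(n) if P0[i][j] > 0] for i in range(n)}
    components = strongly_connected_components(adj)
    index = {v: k for k, comp in enumerate(components) for v in comp}
    arcs = {(index[i], index[j]) for i in adj for j in adj[i] if index[i] != index[j]}
    return components, arcs

# Assumed instance of the second diagram of Fig. 1 (probabilities 1/2).
P0 = [[1.0, 0.0, 0.0],
      [0.0, 0.5, 0.5],
      [0.5, 0.5, 0.0]]
components, arcs = condensation(P0)
sinks = [k for k in range(len(components)) if all(src != k for src, _ in arcs)]
print(components, arcs, [components[k] for k in sinks])
```

Here the components are $\{1, 2\}$ and $\{0\}$, and $\{0\}$ is the only component without outgoing arcs in $G'$.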

We denote by $a = (a_0, \dots, a_n)$ the initial probability distribution on the set of states. Without loss of generality we assume that

    there are no states whose probability of being attained equals zero at every time moment.    (2)

In what follows we sometimes deal with initial distributions for which condition (2) does not hold; we mention all such cases explicitly.
We associate an arbitrary path $c = (i_1, \dots, i_m)$ in the graph $G_0$ with the weight $\Pr(c) = p_{i_1 i_2} \cdots p_{i_{m-1} i_m}$. Instead of a path in the graph, it is often more convenient to consider an ordered set of states of the chain $w = (E_{i_1}, \dots, E_{i_m})$. We call this set a word if $a_{i_1} > 0$, $E_{i_m} = E_0$, and $E_{i_{m-1}} \ne E_0$. In other words, a word is a sequence of states of the system from the start of the walk until the absorption by the state $E_0$. We define the word probability $\Pr(w)$ taking into account the initial distribution:

$$\Pr(w) = a_{i_1} p_{i_1 i_2} \cdots p_{i_{m-1} i_m}. \tag{3}$$

One can easily prove that the set of all words with the measure $\Pr$ forms a discrete probability space (i.e., the sum of the probabilities of all words equals one).
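Since the set of words is usually infinite, the normalization can only be checked approximately by brute force. A minimal sketch, assuming $a_0 = 0$: a depth-first search over incomplete words that keeps every word with $\Pr(w) \ge q_{\min}$. The cutoff $q_{\min}$ and the function name are our illustrative devices, and the chain is an assumed instance of the fifth diagram of Fig. 1 with transition probabilities 1/2 and $a = (0, 1/2, 1/2)$.

```python
def enumerate_words(a, P0, q_min):
    """All words with Pr(w) >= q_min, as (word, probability) pairs.

    a  -- initial distribution (a_0, ..., a_n); assumed here to have a_0 = 0
    P0 -- stochastic matrix given as a list of rows; state 0 is absorbing
    A word is a tuple of state indices ending with 0; its probability is given
    by formula (3). Extending an incomplete word never increases its
    probability, so the search stops as soon as the threshold is crossed.
    """
    n = len(P0)
    words = []
    stack = [((i,), a[i]) for i in range(1, n) if a[i] > 0]   # incomplete words
    while stack:
        prefix, pr = stack.pop()
        last = prefix[-1]
        for j in range(n):
            if P0[last][j] == 0:
                continue                            # no arc last -> j
            pj = pr * P0[last][j]
            if pj < q_min:
                continue                            # too improbable, and so are extensions
            if j == 0:
                words.append((prefix + (0,), pj))   # absorbed: a complete word
            else:
                stack.append((prefix + (j,), pj))
    return words

P0 = [[1.0, 0.0, 0.0],
      [0.5, 0.0, 0.5],
      [0.5, 0.5, 0.0]]
words = enumerate_words([0.0, 0.5, 0.5], P0, q_min=2.0 ** -30)
total = sum(pr for _, pr in words)
print(len(words), total, 1 - total)
```

For this chain there are two words of each length $L \ge 1$, each of probability $2^{-(L+1)}$, so the retained mass is $1 - 2^{-29}$.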

We understand the length $L$ of a word $w$ as the number of states in it, excluding the last (absorbing) state $E_0$. We also denote by $\mathcal{C}$ the set of all simple cycles of the graph $G$.

Let us sort all words in nonincreasing order of their probabilities. Evidently, both the value $p(t) = \Pr(w_t)$ (the probability of the $t$-th word in this ordered list) and the "inverse" function $Q(q)$, $q \in (0, 1]$ (the number of words whose probability is not less than $q$), are well defined. We are interested in the asymptotics of the function $p(t)$ as $t \to \infty$ (or, equivalently, that of the function $Q(q)$ as $q \to 0$).
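Once all words above a threshold are available, $p(t)$ and $Q(q)$ are obtained by sorting and counting. The sketch below relies on the fact that every word not listed is less probable than every listed one, so the first entries of the sorted list are exactly $p(1), p(2), \dots$ The helper names are hypothetical, and the chain is an assumed instance of the second diagram of Fig. 1 with all transition probabilities 1/2 and $a = (0, 1, 0)$.

```python
def sorted_word_probabilities(a, P0, q_min):
    """Probabilities of all words with Pr(w) >= q_min, in nonincreasing order.
    Then p(t) = probs[t - 1] for t <= len(probs), and Q(q_min) = len(probs)."""
    n = len(P0)
    probs = []
    stack = [(i, a[i]) for i in range(1, n) if a[i] > 0]   # (last state, probability)
    while stack:
        last, pr = stack.pop()
        for j in range(n):
            if P0[last][j] == 0:
                continue
            pj = pr * P0[last][j]
            if pj < q_min:
                continue
            if j == 0:
                probs.append(pj)          # complete word
            else:
                stack.append((j, pj))     # incomplete word, one letter longer
    probs.sort(reverse=True)
    return probs

def Q(a, P0, q):
    """Number of words whose probability is not less than q."""
    return len(sorted_word_probabilities(a, P0, q))

P0 = [[1.0, 0.0, 0.0],
      [0.0, 0.5, 0.5],     # E_1 -> E_1, E_1 -> E_2
      [0.5, 0.5, 0.0]]     # E_2 -> E_0, E_2 -> E_1
a = [0.0, 1.0, 0.0]
probs = sorted_word_probabilities(a, P0, 2.0 ** -20)

def p(t):
    return probs[t - 1]

print(p(1), p(2), p(3), Q(a, P0, 2.0 ** -20))
```

In this chain every word of length $L$ has probability $2^{-L}$, and the words of length $L$ are counted by the Fibonacci number $F_{L-1}$ (they start with $E_1$, end with $E_2$, and never repeat $E_2$). Hence $Q(2^{-20}) = F_1 + \dots + F_{19} = F_{21} - 1 = 10945$.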

We use the standard $O$-notation: $O$ denotes an upper bound for the asymptotic order, $\Omega$ a lower bound, and $\Theta$ the exact order, i.e., both bounds simultaneously ([16, Section 9.2]).

Theorem 1. Three cases are possible:

1. If the graph $G$ is acyclic, then the function $p(t)$ is finitary (i.e., the number of all possible words is finite).

2. If the graph $G$ contains a vertex which is common to two different simple cycles, then $p(t) = \Omega(t^{-1/\beta})$, where $\beta$ is the real number for which the eigenvalue of maximal modulus of the matrix $P_G(\beta)$ equals one. Such a $\beta$ exists, is unique, and belongs to the interval $(0, 1)$. Moreover, $p(t) = o(t^{-1/\beta'})$ for any $\beta' > \beta$. In addition, the exact power order (i.e., the equality $p(t) = \Theta(t^{-1/\beta})$) is attained if and only if every simple path in the graph $G'$ contains at most one vertex (a strongly connected component $H$ of the graph $G$) such that the matrix $P_H(\beta)$ has the unit eigenvalue.

3. If the graph $G$ contains cycles, and each vertex of $G$ belongs to at most one simple cycle, then $p(t) = \Omega(\alpha^t)$ and $p(t) = o(t^{-\lambda})$ for every $\lambda > 0$, where $\alpha \in (0, 1)$ is a constant depending on the matrix $P$. Moreover, $p(t) = O(\exp(-\gamma t))$ for some $\gamma > 0$ if and only if every path in the graph $G$ contains vertices of at most one cycle. In this case $p(t) = \Theta(\exp(-\gamma' t))$, where $\gamma'$ is defined by the formula
$$\frac{1}{\gamma'} = -\sum_{c \in \mathcal{C}} \frac{k(c)}{\ln \Pr(c)},$$
and $k(c)$ is the number of distinct words-paths in $G_0$ with nonrepeating states that pass through some vertices of the cycle $c$.

Figure 1: Examples of graphs $G_0$ of a Markov chain with three states $E_0, E_1, E_2$ (the vertex corresponding to the absorbing state $E_0$ is pictured at the bottom). In the first case the function $p(t)$ is finitary, in the second one it has a power asymptotics, in the third one the asymptotics is subexponential, and in the fourth and fifth cases it is exponential. Note that the classification depends only on the graph $G$ (the upper part of the figure), provided that the states $E_1, E_2$ are nonrecurrent.

Figure 2: An example of the graph $G_0$ of a Markov chain with five states $E_0, E_1, E_2, E_3, E_4$ (the vertex corresponding to the absorbing state $E_0$ is at the left). The function $p(t)$ is bounded by two functions with power asymptotics; however, their exponents do not coincide, therefore the function $p(t)$ itself does not necessarily have a power asymptotics (this depends on the concrete values of the transition probabilities; see the discussion of this example after the statement of the theorem).

Remark 1. The first item of the Theorem is trivial 
(we consider it only for the sake of completeness). It 
follows from the fact that in an acyclic graph the 
length of any word does not exceed n. 
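The case distinction in Theorem 1 depends only on the graph $G$, so it can be automated. A minimal sketch (the function `classify` is hypothetical and assumes, as in the theorem, that $E_1, \dots, E_n$ are nonrecurrent): a strongly connected component with $v$ vertices and $m$ internal arcs is a single simple cycle when $m = v$, while $m > v$ forces a vertex lying on two different simple cycles; for item 3 it checks whether some path meets two different cycles. The test matrices are our guesses at the five diagrams of Fig. 1, reconstructed from the descriptions in the text.

```python
def classify(P):
    """Which case of Theorem 1 applies, judging only by the graph G of P.

    P is the substochastic matrix over the nonrecurrent states E_1..E_n
    (a list of rows); returns "finitary", "power", "subexponential" or
    "exponential".
    """
    n = len(P)
    arc = [[P[i][j] > 0 for j in range(n)] for i in range(n)]
    reach = [row[:] for row in arc]          # reach[i][j]: a path of length >= 1 from i to j
    for k in range(n):                       # Warshall's transitive closure
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    cyclic, seen = [], set()
    for i in range(n):
        if i in seen or not reach[i][i]:
            continue                         # already handled, or i lies on no cycle
        comp = {j for j in range(n) if reach[i][j] and reach[j][i]}
        seen |= comp
        inner_arcs = sum(arc[u][v] for u in comp for v in comp)
        if inner_arcs > len(comp):
            return "power"                   # item 2: a vertex on two simple cycles
        cyclic.append(comp)                  # the component is one simple cycle
    if not cyclic:
        return "finitary"                    # item 1: G is acyclic
    for c1 in cyclic:
        for c2 in cyclic:
            if c1 is not c2 and any(reach[u][v] for u in c1 for v in c2):
                return "subexponential"      # item 3: some path meets two cycles
    return "exponential"                     # item 3: every path meets at most one cycle

# Guessed matrices for the five diagrams of Fig. 1 (rows and columns: E_1, E_2).
diagrams = {
    1: [[0.0, 0.5], [0.0, 0.0]],
    2: [[0.5, 0.5], [0.5, 0.0]],
    3: [[0.5, 0.5], [0.0, 0.5]],
    4: [[0.5, 0.0], [0.0, 0.5]],
    5: [[0.0, 0.5], [0.5, 0.0]],
}
for number, P in diagrams.items():
    print(number, classify(P))
```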

Examples: The graph in the second diagram of Fig. 1 has a unique strongly connected component, with vertices $\{1, 2\}$ (we do not consider the trivial cycle from the absorbing state to itself). This component contains the cycles $(1, 2, 1)$ and $(1, 1)$; therefore the function $p(t)$ has a power asymptotics. If all transition probabilities from the states $E_1, E_2$ equal 1/2, then one can easily calculate that $\beta = \lg \varphi$, where $\varphi = (1 + \sqrt{5})/2$ and $\lg$ is the binary logarithm. The graph in Fig. 2 has two strongly connected components $H_1$ and $H_2$ (again we do not consider the trivial cycle from the absorbing state to itself), and both of them belong to one path in the graph $G'$. Each of these components consists of two loops at its two vertices and one more cycle of length two connecting these vertices. Therefore, the conditions of item 2 of Theorem 1 are fulfilled. However, if all transition probabilities from the states $E_1, E_2, E_3, E_4$ equal 1/3, then one can easily calculate that $\beta = 1/\lg 3$. For this value of $\beta$ the matrices
$$P_{H_1}(\beta) = P_{H_2}(\beta) = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix}$$
have the unit eigenvalue. Therefore, the exact power asymptotics does not take place, namely, $p(t) = \Omega(t^{-\lg 3})$ and $p(t) = o(t^{-\delta})$ for any $\delta < \lg 3$, but $p(t) \ne \Theta(t^{-\lg 3})$.
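Lemma 1 below suggests a simple way to compute $\beta$: the maximal eigenvalue $r(\beta)$ of $P(\beta)$ decreases, with $r(0) \ge 1 > r(1)$, so bisection on $[0, 1]$ applies. The sketch below is pure Python; the spectral radius is approximated by Gelfand's formula with repeated squaring only to avoid third-party dependencies (with NumPy one could take the largest absolute eigenvalue instead). The matrix `P_fig2` is one arrangement consistent with the description of Fig. 2; which vertices the arcs from $H_1$ to $H_2$ and from $H_2$ to $E_0$ join is our assumption.

```python
import math

def spectral_radius(A, squarings=50):
    """Spectral radius of a nonnegative square matrix via Gelfand's formula
    r(A) = lim ||A^m||^(1/m), using m = 2**squarings and the max-entry norm.
    B is rescaled after each squaring so that nothing overflows."""
    n = len(A)
    B = [list(map(float, row)) for row in A]
    log_r = 0.0    # invariant: A^(2^k) = exp(2^k * log_r) * B, max entry of B = 1
    for k in range(squarings + 1):
        m = max(max(row) for row in B)
        if m == 0.0:
            return 0.0                                   # nilpotent matrix
        B = [[x / m for x in row] for row in B]
        log_r += math.log(m) / 2 ** k
        if k < squarings:
            B = [[sum(B[i][l] * B[l][j] for l in range(n)) for j in range(n)]
                 for i in range(n)]
    return math.exp(log_r)

def power_matrix(P, beta):
    """P(beta) = (p_ij ** beta) with the convention 0 ** beta = 0."""
    return [[p ** beta if p > 0 else 0.0 for p in row] for row in P]

def find_beta(P, tol=1e-12):
    """The beta in [0, 1) for which the spectral radius of P(beta) is one."""
    if spectral_radius(power_matrix(P, 0.0)) <= 1.0 + 1e-9:
        return 0.0                        # no vertex of G lies on two simple cycles
    lo, hi = 0.0, 1.0                     # r(lo) > 1 > r(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if spectral_radius(power_matrix(P, mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

P_diagram2 = [[0.5, 0.5], [0.5, 0.0]]     # second diagram of Fig. 1, probabilities 1/2
third = 1 / 3
P_fig2 = [[third, third, third, 0.0],     # assumed arcs: E1 -> E3, E2 -> E4,
          [third, third, 0.0, third],     # and E3 -> E0, E4 -> E0 (not in this matrix)
          [0.0, 0.0, third, third],
          [0.0, 0.0, third, third]]
print(find_beta(P_diagram2), math.log2((1 + math.sqrt(5)) / 2))
print(find_beta(P_fig2), 1 / math.log2(3))
```

Both values agree with the closed forms quoted above: $\lg \varphi \approx 0.694$ and $1/\lg 3 \approx 0.631$.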

The graph in the third diagram of Fig. 1 has two simple cycles (loops), and the graph $G$ contains a path going through all vertices; therefore the asymptotics is subexponential but not exponential. The graph in the fourth diagram of Fig. 1 has two analogous cycles, but the graph $G$ does not contain the path mentioned in the previous example; consequently, the function $p(t)$ decreases with exponential order. Note that $k(c) = 1$ for each of the cycles. The graph in the fifth diagram has one simple cycle, and the order of the asymptotics is also exponential. If $a_1 > 0$ and $a_2 > 0$, then $k(c) = 4$: the four desired words with nonrepeating states are $(E_1, E_0)$, $(E_2, E_0)$, $(E_1, E_2, E_0)$, and $(E_2, E_1, E_0)$. Now if, for the Markov chains with the graphs depicted in the fourth and fifth diagrams, all transition probabilities from the states $E_1, E_2$ equal 1/2, then one can easily calculate that $\gamma' = \ln \sqrt{2}$ in both cases.

Remark 2. As was proved earlier [9], [10], if the states are chosen independently and the probability of each state $E_i$ is $p_i$, $i = 0, \dots, n$, then for $n > 1$ the function $p(t)$ has a power asymptotics; its exponent equals $1/\beta$, where $\beta$ is determined from the equation $\sum_{i=1}^{n} p_i^{\beta} = 1$. This is a particular case of item 2 of Theorem 1 in which the matrix $P$ consists of nonzero elements and has equal rows. Raising all elements of the matrix $P$ to the power $\beta$, we obtain a stochastic matrix; it is well known that the maximal eigenvalue of a stochastic matrix equals one.
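For the monkey model of Remark 2 the equation for $\beta$ is one-dimensional. A minimal bisection sketch with made-up letter probabilities (the space then has probability 0.4); `monkey_beta` is a hypothetical name. It also checks the observation above that $P(\beta)$, whose rows all coincide, is stochastic.

```python
import math

def monkey_beta(letter_probs, tol=1e-13):
    """Root beta in (0, 1) of sum_i p_i**beta = 1, found by bisection.
    The left-hand side decreases in beta; it equals n >= 2 at beta = 0 and
    sum_i p_i < 1 at beta = 1 when the space has positive probability."""
    def excess(beta):
        return sum(p ** beta for p in letter_probs) - 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

letters = [0.3, 0.2, 0.1]                  # made-up letter probabilities; space: 0.4
beta = monkey_beta(letters)
P_beta = [[p ** beta for p in letters] for _ in letters]   # equal rows, raised to beta
row_sums = [sum(row) for row in P_beta]
print(beta, 1 / beta, row_sums)
```

The power exponent of $p(t)$ is then $1/\beta$; for two letters of probability 1/3 each, the same routine returns $\log_3 2$.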



3 Spectral properties of substochastic matrices

Prior to proving item 2 of Theorem 1, let us prove the existence and uniqueness of the exponent $\beta$ in this case. Consider an arbitrary (substochastic) matrix $P = (p_{ij})_{i,j=1}^{n}$ with the following properties (in the conditions below the indices $i, j$ belong to $\{1, \dots, n\}$):

    $0 \le p_{ij} \le 1$ for all $i, j$;
    $\sum_{j=1}^{n} p_{ij} \le 1$ for all $i$ (substochasticity);
    the matrix $P$ is not nilpotent;
    for any principal submatrix of the matrix $P$ there exists a row such that the sum of its elements in this submatrix is strictly less than one.    (4)

Note that the last property is equivalent to the nonrecurrence of all states except the absorbing one [11].

Recall that for matrices with nonnegative elements (nonnegative matrices) the following theorem holds [17, Theorem 3, Chapter XIII]: "A non-negative matrix $A = (a_{ij})_{i,j=1}^{n}$ always has a non-negative characteristic value $r$ such that the moduli of all characteristic values of $A$ do not exceed $r$. To this maximal characteristic value $r$ there corresponds a non-negative characteristic vector: $Ay = ry$ ($y \ge 0$, $y \ne 0$)." Note that both the matrix $A$ and its transpose $A^{\mathsf T}$ may have no positive eigenvector (a vector all of whose components are strictly positive). Later we discuss the existence condition for such a vector.

Recall that the symbol $P(\beta)$ denotes the matrix $(p_{ij}^{\beta})_{i,j=1}^{n}$ (here $0^{\beta} = 0$ for any $\beta$).

Lemma 1. For any matrix $P$ of the form (4) there exists a unique $\beta \in \mathbb{R}$ such that the maximal characteristic value of the matrix $P(\beta)$ equals 1; moreover, $0 \le \beta < 1$. The inequality $\beta > 0$ is equivalent to the existence in the graph $G$ of two different simple cycles that pass through one and the same vertex.

Proof: Denote by $s_i$ the sum $\sum_{j=1}^{n} p_{ij}$. Let $s = \min_i s_i$ and $S = \max_i s_i$. It is known [17, Note on p. 68] that the maximal characteristic value $r$ of any nonnegative matrix satisfies the inequality $s \le r \le S$. Denote by $r(\beta)$ the maximal eigenvalue of the matrix $P(\beta)$, and set $s(\beta) = \min_i s_i(P(\beta))$ and $S(\beta) = \max_i s_i(P(\beta))$.

Let us prove the uniqueness of the choice of $\beta$ and the validity of the inequality $0 \le \beta < 1$. Recall that the matrix $P$ is called indecomposable if the oriented graph $G$ is strongly connected. It is known [17, p. 63] that indecomposable nonnegative matrices with unequal values of $s$ and $S$ satisfy the strict inequality $s < r < S$. In the general case, the decomposition of a graph into strongly connected components corresponds to the normal form of the matrix, obtained from the initial one by renumbering its rows (and, correspondingly, its columns). The diagonal of the normal form (see [17, p. 75]) is occupied by square blocks corresponding to the sets of vertices that belong to one and the same strongly connected component; the matrix elements located above these blocks equal zero. Therefore, sequentially expanding the determinant along the groups of rows that correspond to strongly connected components, we obtain that the characteristic polynomial of the matrix equals the product of the characteristic polynomials of the diagonal blocks, and $r(\beta)$ coincides with the maximal eigenvalue of the blocks. However, according to (4), for the square submatrices corresponding to each of these blocks the value $s$ is strictly less than one. In addition, not all blocks are zero (otherwise the matrix $P$ would be nilpotent), and $s(0) \ge 1$ for at least one of the blocks. Consequently, $r(1) < 1$ and $r(0) \ge 1$.

Evidently, $p_{ij}^{\beta}$ does not increase as $\beta$ increases if $p_{ij} > 0$, and it strictly decreases if $0 < p_{ij} < 1$. It is known [17, Theorem 6, Chapter XIII] that if some elements of a nonnegative indecomposable matrix decrease, then its maximal characteristic value strictly decreases. Therefore $r(\beta)$ is a decreasing function. We have proved the uniqueness of the choice of $\beta$ and the validity of the inequality $0 \le \beta < 1$.

Let us prove the last assertion of the lemma. In the normal form of the matrix $P$ we consider the block containing the vertex that belongs to two different cycles. For this block we introduce analogs of the values $s(\beta)$ and $S(\beta)$; we denote them by $s'(\beta)$ and $S'(\beta)$, respectively. By definition, the considered block is an indecomposable matrix. Consequently, $s'(0) \ge 1$ and $S'(0) \ge 2$. Hence for the matrix $P(0)$ we get $r(0) > 1$, which implies that in this case the desired value of $\beta$ is strictly positive.

It remains to prove that if no vertex of the graph $G$ belongs to two cycles, then the desired value of $\beta$ equals zero. Indeed, the considered diagonal blocks are either trivial (i.e., consist of one element) or correspond to nontrivial strongly connected components of the graph $G$. A nontrivial component, by definition, contains a cycle going through all its vertices. In our case this cycle cannot be self-intersecting, because otherwise there would exist a vertex belonging to two cycles. For the same reason, the strongly connected component has no arcs except those of the considered (simple) cycle. But this means that for the corresponding block $S'(0) = s'(0) = 1$. Since the characteristic polynomial of the matrix $P(0)$ is the product of the characteristic polynomials of the diagonal blocks, we obtain $r(0) = 1$. □

Corollary 1. Assume that under the conditions of Lemma 1 we have $\beta > 0$, and the normal form contains several blocks representing strongly connected components $H$ of the graph $G$ such that the maximal characteristic values of the matrices $P_H(\beta)$ equal one. Then each of these graphs $H$ contains two different simple cycles going through one and the same vertex.

Evidently, Lemma 1, combined with the nonrecurrence of the states of the Markov chain, implies the existence of the exponent $\beta$ in the interval $(0, 1)$ provided that the conditions of item 2 of Theorem 1 are fulfilled.

Let us now consider the case when the matrix $P(\beta)^{\mathsf T}$ has a positive eigenvector corresponding to the unit eigenvalue. Restating the standard necessary and sufficient conditions for the existence of a positive eigenvector (see [17, Theorem 7, Chapter XIII]), we obtain the following assertion.

Proposition 1. Let the assumptions of Lemma 1 be fulfilled and $\beta > 0$. The matrix $P(\beta)^{\mathsf T}$ has a positive eigenvector corresponding to the unit eigenvalue if and only if, in the graph $G'$, the vertices without incoming arcs, and only they, correspond to strongly connected components $H$ for which the matrices $P_H(\beta)$ have the unit characteristic value.

Evidently, there exists a path from the set of all such vertices to any vertex of the graph $G'$. Since each of the graphs $H$ mentioned in Corollary 1 has a vertex with at least two incoming arcs, we obtain Corollary 2.



Corollary 2. Assume that under the conditions of item 2 of Theorem 1 the matrix $P(\beta)^{\mathsf T}$ has a positive eigenvector corresponding to the unit eigenvalue. Then we can choose a vector $a = (a_1, \dots, a_n)$ satisfying condition (2) such that $a_k = 0$ for all vertices $k$ with fewer than two incoming arcs.

Note that Proposition 1 implies that under the conditions of Corollary 2 the graph $G$ has no vertices without incoming arcs.

4 The power law in the case of the existence of a positive eigenvector

We need some more auxiliary assertions, which are valid in the case of power inequalities for the function $p(t)$.

Lemma 2. Let $\delta > 0$.

A. Assume that for some initial distribution $a$ (not necessarily satisfying condition (2)) we have $p_a(t) = \Omega(t^{-\delta})$ (hereinafter the subscript indicates the initial distribution). Then for any initial distribution $a'$ satisfying condition (2) we have $p_{a'}(t) = \Omega(t^{-\delta})$.

B. Assume that for some initial distribution $a = (a_1, \dots, a_n)$ satisfying condition (2) we have $p_a(t) = O(t^{-\delta})$. Then for any initial distribution $a'$ we have $p_{a'}(t) = O(t^{-\delta})$.

If the order of the asymptotics is not a power one, then the assertion analogous to Lemma 2, generally speaking, does not hold. Namely, the order of the asymptotics of the function $p(t)$ may depend on the initial distribution. Thus, for the Markov chain corresponding to the last (fifth) diagram in Fig. 1, we obtain the exponential order of the asymptotics of the function $p(t)$ with the exponent $\gamma' = \ln \sqrt{2}$. Here we assume that $a_1 > 0$ and $a_2 > 0$. But if in this chain $a = (1, 0)$, then, as one can easily prove, the asymptotics is exponential with $\gamma' = \ln 2$.

Note that our assertion is equivalent to an analogous one for the inverse (more precisely, quasi-inverse, see [18]) function $Q(q)$ with $1/\delta$ in place of $\delta$. Indeed, judging by the graph of the function $p(t)$, the inequality $p(t) \le c\, t^{-\delta}$ (respectively, $p(t) \ge c\, t^{-\delta}$) holding for all $t \ge 1$ is equivalent to the inequality $Q(q) \le (q/c)^{-1/\delta} = \text{const}\cdot q^{-1/\delta}$ (respectively, $Q(q) \ge \text{const}\cdot q^{-1/\delta}$) for all sufficiently small values of $q$.

Proof of Lemma 2A: Let $I(a) = \{i : a_i > 0\}$ and $E(a) = \{E_i : a_i > 0\}$. By condition (2), in the Markov chain with the initial distribution $a'$ (denoted MCh$_{a'}$) there exist words containing the states from $E(a)$. Therefore, for each $j \in I(a)$ there exists a path $(i', \dots, j)$ in the graph $G_0$ such that $i' \in I(a')$; we denote this path by $\pi(j)$. We associate each word $w$ of MCh$_a$ that starts with $E_j$ with the word $w'$ of MCh$_{a'}$ obtained by prepending the states of the path $\pi(j)$, so that $w'$ starts with $E_{i'}$ and continues as $w$ after reaching $E_j$. Evidently, $\Pr_{a'}(w') = \Pr_a(w)\, c(j)$, where $c(j) = \Pr(\pi(j))\, a'_{i'}/a_j$. It is possible that several words of MCh$_a$ correspond to one and the same word of MCh$_{a'}$. However, such a word can appear in the associated list at most $n$ times.

Consider the sorted list of the first $nt$ words $(w_1, \dots, w_{nt})$ of MCh$_a$ and associate them with the words $(w'_1, \dots, w'_{nt})$ of MCh$_{a'}$ (some of them may coincide, but at least $t$ of them are distinct). We get $p_{a'}(t) \ge \min_i \Pr_{a'}(w'_i) \ge p_a(nt) \min_{j \in I(a)} c(j) \ge \text{const}\cdot t^{-\delta}$. □

Interchanging $a$ and $a'$, we obtain the inequality $p_{a'}(t) \le c\, p_a(\lceil t/n \rceil)$, which gives the assertion of Lemma 2B.

Let us now prove the key lemma, which covers an important particular case of item 2 of Theorem 1.

Lemma 3. Assume that the graph $G$ has a vertex that belongs to two different simple cycles, $\beta$ is chosen in accordance with Lemma 1, and the matrix $P(\beta)^{\mathsf T}$ has a positive eigenvector $e$ corresponding to the unit eigenvalue. Then $p(t) = \Theta(t^{-1/\beta})$.

Proof (cf. the proof in [10]): As was noted earlier (before the proof of Lemma 2), the assertion on the power asymptotics of the function $p(t)$ is equivalent to an analogous assertion for the function $Q$. Let us prove the latter.

For convenience we introduce an empty word and assume that its probability equals one (so that it has rank one). Denote by $Q'(x)$ the number of words whose probability (after this redefinition) is not less than $x$. We have $Q'(x) = Q(x) + 1$ for all $x \le 1$; in particular, $Q'(1) = 1$. Evidently, the assertion on the power asymptotics of the function $Q(q)$ as $q \to 0$ is equivalent to the same assertion for the function $Q'$.

We understand an incomplete word as an initial part $(i_1, \dots, i_m)$ of a word (a path) such that $a_{i_1} > 0$; we define the "probability" of an incomplete word by the same formula (3). For positive $x$ we introduce the functions $Q_k(x)$ equal to the number of incomplete words that end with the symbol $E_k$ and whose "probability" is not less than $x$. Evidently, $Q_k(x) = 0$ for $x > 1$. We treat the empty word as incomplete and set

$$Q_0(x) = \begin{cases} 1, & x \le 1, \\ 0, & x > 1. \end{cases}$$

We also need the functions $Q'_k(x) = Q_k(x) + 1$ for all $x \le 1$, $k = 1, \dots, n$.

The definition implies the following important recurrence relation:

$$Q_k(x) = \sum_{m:\, p_{mk} > 0} Q_m(x/p_{mk}) + \chi_k(x), \qquad \text{where } \chi_k(x) = \begin{cases} Q_0(x/a_k), & a_k > 0, \\ 0, & \text{otherwise}. \end{cases}$$

In particular, the following inequality holds:

$$Q_k(x) \ge \sum_{m:\, p_{mk} > 0} Q_m(x/p_{mk}), \qquad k = 1, \dots, n. \tag{5}$$
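The recurrence relation above can be evaluated directly as a memoized recursion in $x$. A sketch under assumptions: the chain is the instance of the second diagram of Fig. 1 with all transition probabilities 1/2 and $a = (0, 1, 0)$, and the functions `Q0`, `Qk`, `Q` simply mirror the notation of the text. The last function uses $Q(x) = \sum_{m:\, p_{m0} > 0} Q_m(x/p_{m0})$, i.e., a complete word is an incomplete word followed by the transition to $E_0$.

```python
from functools import lru_cache

# Assumed instance: second diagram of Fig. 1, all transition probabilities 1/2.
P0 = [[1.0, 0.0, 0.0],
      [0.0, 0.5, 0.5],     # E_1 -> E_1, E_1 -> E_2
      [0.5, 0.5, 0.0]]     # E_2 -> E_0, E_2 -> E_1
a = [0.0, 1.0, 0.0]        # a = (a_0, a_1, a_2)
n = len(P0) - 1            # number of letters E_1, ..., E_n

def Q0(x):
    """The empty word has probability one."""
    return 1 if x <= 1 else 0

@lru_cache(maxsize=None)
def Qk(k, x):
    """Number of incomplete words ending with E_k whose 'probability' is >= x."""
    if x > 1:
        return 0
    total = Q0(x / a[k]) if a[k] > 0 else 0      # chi_k(x): the one-letter word (E_k)
    for m in range(1, n + 1):
        if P0[m][k] > 0:
            total += Qk(m, x / P0[m][k])         # words ending with E_m, extended by E_k
    return total

def Q(x):
    """Number of complete words whose probability is >= x."""
    return sum(Qk(m, x / P0[m][0]) for m in range(1, n + 1) if P0[m][0] > 0)

print(Q(0.25), Q(0.125), Q(2.0 ** -20))
```

For this chain every word of length $L$ has probability $2^{-L}$, so $Q(2^{-20})$ should equal the number of words of length at most 20, namely 10945.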

Choosing $a_k$ in accordance with Corollary 2, we obtain the inequality

$$Q'_k(x) \le \sum_{m:\, p_{mk} > 0} Q'_m(x/p_{mk}), \qquad k = 1, \dots, n. \tag{6}$$

Let the vector $e$ mentioned in the statement of the lemma have components $(e_1, \dots, e_n)$. One can easily verify that the functions $f_k(x) = e_k x^{-\beta}$, $k = 1, \dots, n$, satisfy the following system of functional equations:

$$f_k(x) = \sum_{m:\, p_{mk} > 0} f_m(x/p_{mk}), \qquad k = 1, \dots, n. \tag{7}$$
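Substituting $f_k(x) = e_k x^{-\beta}$ into (7) gives $e_k = \sum_{m:\, p_{mk} > 0} p_{mk}^{\beta} e_m$, i.e., $e$ must be an eigenvector of the transposed matrix $P(\beta)^{\mathsf T}$ with eigenvalue 1. A numerical check, assuming the instance of the second diagram of Fig. 1 with all transition probabilities 1/2; there $\beta = \lg \varphi$, and one can take $e = (\varphi, 1)$ (our computation, not taken from the text).

```python
import math

# Second diagram of Fig. 1 with all transition probabilities 1/2 (assumed instance).
P = [[0.5, 0.5],    # p_11, p_12
     [0.5, 0.0]]    # p_21, p_22
phi = (1 + math.sqrt(5)) / 2
beta = math.log2(phi)          # exponent of Lemma 1 for this matrix
e = [phi, 1.0]                 # candidate positive eigenvector of P(beta)^T

def f(k, x):
    """f_k(x) = e_k * x**(-beta), with k = 0, 1 standing for E_1, E_2."""
    return e[k] * x ** (-beta)

def rhs(k, x):
    """Right-hand side of (7): sum of f_m(x / p_mk) over m with p_mk > 0."""
    return sum(f(m, x / P[m][k]) for m in range(len(P)) if P[m][k] > 0)

# (7) holds for every x exactly when e = P(beta)^T e:
Pt_e = [sum(P[m][k] ** beta * e[m] for m in range(len(P)) if P[m][k] > 0)
        for k in range(len(P))]
for x in (1e-6, 1e-2, 0.5):
    print(x, [f(k, x) for k in range(2)], [rhs(k, x) for k in range(2)])
print(e, Pt_e)
```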

Now let $M$ be the minimal positive element of the matrix $P$. Taking into account the definitions of the functions $Q_k(x)$ and $Q'_k(x)$ and the fact that they are piecewise constant, we conclude that there exists a segment of the form $[My, y]$ such that for $My \le x \le y$ the following inequalities hold:

$$Q_k(x) \ge c_1 f_k(x), \qquad Q'_k(x) \le c_2 f_k(x), \qquad k = 1, \dots, n,$$

where $c_1, c_2$ are some positive constants. But then formulas (5)-(7) give the same inequalities (with the same constants $c_1$ and $c_2$) for all $x \le y$. Since $Q(x) = \sum_{m:\, p_{m0} > 0} Q_m(x/p_{m0})$, we conclude that $Q(x) = \Theta(x^{-\beta})$ for sufficiently small $x$. □

Corollary 3. Let the assumptions of item 2 of Theorem 1 be fulfilled. Then $p(t) = \Omega(t^{-1/\beta})$, where $\beta$ is the real number such that the eigenvalue of maximal modulus of the matrix $P_G(\beta)$ equals one.

Proof: Indeed, consider the value $\beta > 0$ given by Lemma 1. The normal form of the matrix $P_G(\beta)$ contains blocks that represent strongly connected components $H$ such that the eigenvalue of maximal modulus of the matrix $P_H(\beta)$ equals one. By deleting strongly connected components from the graph $G$ (i.e., deleting vertices of the graph $G'$), we can make the obtained graph $\tilde G$ (more precisely, its condensation $\tilde G'$) satisfy the conditions of Proposition 1. The initial distribution $a$ of the Markov chain with the graph $G$ corresponds to some distribution $\tilde a$ (not necessarily satisfying condition (2)) on the graph $\tilde G$. Applying the proved Lemma 3 and using Lemma 2A, we obtain the assertion of Corollary 3. □

Corollary 4. Let the conditions of item 2 of Theorem 1 be fulfilled. Then $p(t) = o(t^{-1/\beta'})$ for any $\beta' > \beta$.

Proof: Let $k$ be the number of vertices of the graph $G$ without incoming arcs. We consider a Markov chain with $n + 2k$ states whose transition probability matrix $\tilde P$ is obtained from the matrix $P$ by appending $k$ pairs of rows (and the corresponding columns). Each pair contains only one element outside its diagonal $2 \times 2$ block; it corresponds to the transition to the corresponding vertex without incoming arcs, which was unattainable earlier. The subgraph (the strongly connected component) corresponding to a block of two vertices has the form depicted in the upper part of the second diagram in Fig. 1. Therefore, the mentioned $2 \times 2$ block has the form

$$P_2 = \begin{pmatrix} r & s \\ t & 0 \end{pmatrix},$$

where $0 < r, s, t < 1$, $r + s = 1$, and the numbers $r, s, t$ are the same for all blocks. Let us choose $r, s, t$ so as to make the maximal eigenvalue of the matrix $P_2(\beta'')$ equal to one for some $\beta''$ with $\beta < \beta'' < \beta'$. Since the characteristic polynomial of $P_2(\beta'')$ is $\lambda^2 - r^{\beta''}\lambda - s^{\beta''} t^{\beta''}$, it suffices to choose $x$ such that $r^{\beta''} + s^{\beta''} x = 1$ (since $r^{\beta''} + s^{\beta''} > 1$, the desired value of $x$ is less than one) and then to set $t = x^{1/\beta''}$.
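The choice of $r, s, t$ in this proof is explicit. A minimal sketch, assuming the block layout $P_2 = \begin{pmatrix} r & s \\ t & 0 \end{pmatrix}$ (loop $r$ and arc $s$ at the first appended vertex, arc $t$ from the second appended vertex back to the first), a hypothetical $\beta'' = 0.8$, and $r = 1/2$; the helper names are ours.

```python
import math

def gadget(beta2, r=0.5):
    """Return (r, s, t) with r + s = 1 such that the largest eigenvalue of
    P_2(beta2) = [[r^b, s^b], [t^b, 0]] equals one (b = beta2, 0 < b < 1)."""
    s = 1.0 - r
    x = (1.0 - r ** beta2) / s ** beta2    # solves r^b + s^b * x = 1
    return r, s, x ** (1.0 / beta2)        # t = x^(1/b), so that t^b = x

def largest_eigenvalue_2x2(a11, a12, a21, a22):
    """Larger root of lambda^2 - trace * lambda + det (real for nonnegative entries)."""
    trace, det = a11 + a22, a11 * a22 - a12 * a21
    return (trace + math.sqrt(trace * trace - 4 * det)) / 2

beta2 = 0.8                                # hypothetical beta'' with beta < beta'' < beta'
r, s, t = gadget(beta2)
lam = largest_eigenvalue_2x2(r ** beta2, s ** beta2, t ** beta2, 0.0)
print(r, s, t, lam)
```

Because the characteristic polynomial of $P_2(\beta'')$ at $\lambda = 1$ equals $1 - r^{\beta''} - s^{\beta''} x$, the computed largest eigenvalue is one up to rounding.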

Let us now consider the Markov chain whose transition probability matrix between non-absorbing states is $\tilde P$. Evidently, the matrix $\tilde P(\beta'')$ satisfies the conditions of Proposition 1, whence, by Lemma 3 and Lemma 2B, for the initial Markov chain we obtain $p(t) = O(t^{-1/\beta''})$. Since $1/\beta'' > 1/\beta'$, this gives $p(t) = o(t^{-1/\beta'})$, which was to be proved. □

5 Completion of the proof of item 2 of Theorem 1

It remains to establish necessary and sufficient conditions for the power asymptotics. Sufficient but not necessary conditions are given by the assumptions of Lemma 3. In order to complete the proof of item 2 of Theorem 1 with the help of Lemma 3, we need two more auxiliary assertions.

Lemma 4. Assume that for the Markov chains with graphs $G_1$ and $G_2$ and some initial distributions (satisfying condition (2)) the functions $p_1(t)$ and $p_2(t)$ (the probabilities of the $t$-th word in the corresponding sorted lists) satisfy the relations $p_1(t) = O(t^{-\delta_1})$ and $p_2(t) = O(t^{-\delta_2})$, where $\delta_1, \delta_2 > 0$. Assume also that for the Markov chain with the function $p(t)$ every word belongs either to the first Markov chain or to the second one. Then for any initial distribution the following relation holds:

$$p(t) = O(t^{-\delta}), \quad \text{where } \delta = \min\{\delta_1, \delta_2\}. \tag{8}$$

Proof: By Lemma 2B it suffices to prove relation (8) for some concrete initial distribution $a$ satisfying condition (2). Let us choose it as $(a' + a'')/2$, where $a'$ and $a''$ are the initial probability distributions of the first and second Markov chains, respectively. Among the first $t$ words of the sorted list of our Markov chain, at least $\lceil t/2 \rceil$ belong to one and the same of the two chains.

By condition there exist positive constants $c_1$ and $c_2$ such that

$$p_1(t) \le c_1 t^{-\delta_1}, \qquad p_2(t) \le c_2 t^{-\delta_2} \quad \text{for all } t. \tag{9}$$

Since $\lceil t/2 \rceil \ge t/2$ and $t^{-\delta_i} \le t^{-\delta}$ for $t \ge 1$, we can choose a constant $c$ such that $c\, t^{-\delta} \ge \max\{2^{\delta_1} c_1 t^{-\delta_1},\ 2^{\delta_2} c_2 t^{-\delta_2}\}$ for all natural $t$. We have

$$p(t) \le \max\{p_1(\lceil t/2 \rceil),\ p_2(\lceil t/2 \rceil)\} \le c\, t^{-\delta}. \qquad \square$$
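The counting step in this proof is a statement about sorted lists: among the $t$ largest entries of a union of two families, at least $\lceil t/2 \rceil$ come from one family, so the $t$-th largest entry of the union does not exceed the larger of the two $\lceil t/2 \rceil$-th entries. A short sketch checking this on random data and on two power-law families with $\delta_1 = 2$, $\delta_2 = 1.5$, $c_1 = c_2 = 1$ (all values assumed; the helper name is hypothetical).

```python
import math
import random

def merged_bound_holds(list1, list2):
    """Check p(t) <= max(p1(ceil(t/2)), p2(ceil(t/2))) for every t, where p1, p2
    and p are the nonincreasing orderings of two families and of their union."""
    p1 = sorted(list1, reverse=True)
    p2 = sorted(list2, reverse=True)
    p = sorted(list1 + list2, reverse=True)
    for t in range(1, len(p) + 1):
        h = math.ceil(t / 2)
        bound = max(p1[h - 1] if h <= len(p1) else 0.0,
                    p2[h - 1] if h <= len(p2) else 0.0)
        if p[t - 1] > bound:
            return False
    return True

random.seed(1)
family1 = [random.random() for _ in range(300)]
family2 = [random.random() ** 3 for _ in range(200)]
print(merged_bound_holds(family1, family2))

# Power-law families p1(t) = t^-2 and p2(t) = t^-1.5, i.e. c1 = c2 = 1 in (9).
p1 = [k ** -2.0 for k in range(1, 2001)]
p2 = [k ** -1.5 for k in range(1, 2001)]
p = sorted(p1 + p2, reverse=True)
ok = all(p[t - 1] <= (1 + 1e-9) * max(2 ** 2 * t ** -2.0, 2 ** 1.5 * t ** -1.5)
         for t in range(1, len(p) + 1))
print(ok)
```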



Lemma 5. Assume that the Markov chains with graphs $G_1$ and $G_2$ satisfy the conditions of Lemma 4. Assume that the Markov chain with the function $p(t)$ corresponds to the graph $G$ which is the union of the graphs $G_1$ and $G_2$ with additional arcs going from $G_1$ to $G_2$, so that every vertex of the graph $G_2$ is attainable through a path containing these arcs. Then formula (8) is valid if $\delta_1 \ne \delta_2$. Relation (8) is false if the initial distribution satisfies condition (2), $\delta_1 = \delta_2$, and

$$p_1(t) = \Omega(t^{-\delta_1}), \qquad p_2(t) = \Omega(t^{-\delta_2}).$$



Proof: We denote the constants in inequality (9) by $c_1, c_2$. Recall that the relations $p_1(t) = \Omega(t^{-\delta_1})$ and $p_2(t) = \Omega(t^{-\delta_2})$ mean that there exist constants $c'_1, c'_2 > 0$ such that $p_1(t) \ge c'_1 t^{-\delta_1}$ and $p_2(t) \ge c'_2 t^{-\delta_2}$. Let us prove power estimates for the function $Q(q)$ rather than for $p(t)$. Let us first consider the case $\delta_1 \ne \delta_2$.

First of all, note that without loss of generality we can assume that every word $w$ of our Markov chain is representable in the form $(w_1, w_2)$, where $w_i$, $i = 1, 2$, is a nonempty word of the Markov chain with the graph $G_i$. Indeed, by Lemma 2B we can assume that the initial distribution is concentrated at vertices of the graph $G_1$; therefore, the word $w_1$ is nonempty. If, in addition, there exist words that are words of the chain with the graph $G_1$ alone, followed by the transition to the absorbing state, then with the help of Lemma 4 we reduce the considerations to the case already considered.



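With $\delta_1 > \delta_2$ we have (the reasoning for $\delta_1 < \delta_2$ is analogous):

$$\begin{aligned} Q(q) &= |\{(t_1, t_2) : p_1(t_1)\,p_2(t_2) \ge q\}| \le |\{(t_1, t_2) : c_1 t_1^{-\delta_1} c_2 t_2^{-\delta_2} \ge q\}| \\ &= |\{(t_1, t_2) : t_1^{\delta_1} t_2^{\delta_2} \le c_1 c_2 / q\}| \le \sum_{t_1=1}^{\infty} \Big(\frac{q}{c_1 c_2}\Big)^{-1/\delta_2} t_1^{-\delta_1/\delta_2} = \mathrm{const}\cdot q^{-1/\delta_2}; \end{aligned}$$

the series converges because $\delta_1/\delta_2 > 1$.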
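The inequality chain above can be illustrated numerically. The following minimal Python sketch assumes the hypothetical values $c_1 = c_2 = 1$, $\delta_1 = 2$, $\delta_2 = 1$. The function `count_pairs` counts the pairs $(t_1, t_2)$ with $c_1 t_1^{-\delta_1} c_2 t_2^{-\delta_2} \ge q$ one value of $t_1$ at a time, exactly as in the sum over $t_1$. The count stays below $(c_1 c_2/q)^{1/\delta_2}\,\zeta(\delta_1/\delta_2)$, where $\zeta$ is the Riemann zeta function, bounded from above by the illustrative helper `zeta_upper`.

```python
import math

def count_pairs(c1, d1, c2, d2, q):
    """Number of pairs (t1, t2) of naturals with c1*t1**-d1 * c2*t2**-d2 >= q."""
    total, t1 = 0, 1
    while True:
        # For fixed t1 the admissible t2 satisfy t2 <= (c1*c2/q)**(1/d2) * t1**(-d1/d2).
        bound = (c1 * c2 / q) ** (1 / d2) * t1 ** (-d1 / d2)
        if bound < 1:          # the bound decreases in t1, so we can stop here
            return total
        total += math.floor(bound)
        t1 += 1

def zeta_upper(s, terms=10_000):
    """Upper bound for zeta(s), s > 1: partial sum plus the integral tail bound."""
    partial = sum(k ** -s for k in range(1, terms + 1))
    return partial + terms ** (1 - s) / (s - 1)

c1, d1, c2, d2 = 1.0, 2.0, 1.0, 1.0   # hypothetical constants with d1 > d2
for q in (1e-2, 1e-4, 1e-6):
    rhs = (c1 * c2 / q) ** (1 / d2) * zeta_upper(d1 / d2)
    print(q, count_pairs(c1, d1, c2, d2, q), round(rhs, 1))
```

The ratio of the count to $q^{-1/\delta_2}$ stays bounded as $q \to 0$, which is the power estimate $Q(q) = O(q^{-1/\delta_2})$.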



In the case $\delta_1 = \delta_2 = \delta$, similar considerations lead to the inequality

$$Q(q) \ge |\{(t_1, t_2) : t_1 t_2 \le ((c'_1 c'_2)/q)^{1/\delta}\}|.$$



Here we applied Lemma 2A and the fact that the initial distribution concentrated at the vertices of $G_2$ incident to those of $G_1$ satisfies condition (2) for the Markov chain with the graph $G_2$.

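By the Dirichlet formula for the divisor function, the number of points with natural coordinates whose product does not exceed $N$ equals $N \ln N + (2\gamma - 1)N + O(\sqrt{N})$, where $\gamma$ is the Euler constant. Since here $N = ((c'_1 c'_2)/q)^{1/\delta}$, we get $Q(q) \ge \mathrm{const}\cdot q^{-1/\delta} \ln(1/q)$ for small $q$. Therefore, the inequality $Q(q) \le \mathrm{const}\cdot q^{-1/\delta}$ fails for small $q$ with any positive constant, which was to be proved. $\Box$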
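The Dirichlet asymptotics can be checked directly. The following short Python sketch counts the lattice points under the hyperbola $t_1 t_2 \le N$ as $\sum_{k \le N} \lfloor N/k \rfloor$ and compares the count with $N \ln N + (2\gamma - 1)N$. It also shows that the count divided by $N$ keeps growing, which is why no estimate of the form $Q(q) \le \mathrm{const}\cdot q^{-1/\delta}$ can survive. The function names are illustrative only.

```python
import math

EULER_GAMMA = 0.5772156649015329

def lattice_points_under_hyperbola(N):
    """Number of pairs (t1, t2) of naturals with t1 * t2 <= N (= sum_{k<=N} d(k))."""
    return sum(N // k for k in range(1, N + 1))

def dirichlet_main_term(N):
    return N * math.log(N) + (2 * EULER_GAMMA - 1) * N

for N in (10, 100, 1000, 10_000, 100_000):
    D = lattice_points_under_hyperbola(N)
    error = D - dirichlet_main_term(N)
    # D / N grows like ln N; the error stays well within O(sqrt(N)).
    print(N, D, round(error, 2), round(D / N, 3), round(error / math.sqrt(N), 3))
```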

Completion of the proof of Theorem 1.2: We can rather easily prove the sufficiency of the conditions for the power asymptotics in Theorem 1.2 in the case when the graph $G'$ is a simple path. To this end, we apply the first part of Lemma 5 (evidently, we can use induction on the number of vertices in the graph $G'$). In the case of an arbitrary graph $G'$, all words are classified according to the paths in the graph $G'$ that they follow, and Lemma 4 reduces all considerations to the case already considered.

Let us prove the necessity of the conditions for the power asymptotics in Theorem 1.2. Assume the contrary. Consider a path in the graph $G'$ with exactly two vertices corresponding to graphs $H_1$ and $H_2$ for which $P_{H_1}(\beta)$ and $P_{H_2}(\beta)$ have the unit characteristic value. Evidently, without loss of generality we can assume that our path $\pi$ starts at the vertex that corresponds to the graph $H_1$, goes through the vertex of $H_2$, and ends at a vertex that leads to the absorbing state. Consider the complete subgraph $G_\pi$ of the graph $G$ that includes all vertices of the strongly connected components of the considered path. The graph $G_\pi$ is representable as the union of the graphs $G_1$ and $G_2$ that correspond to the vertices of the path from $H_1$ to $H_2$ (non-inclusive) and from $H_2$ to a vertex that leads to the absorbing state, respectively. In accordance with Proposition 1 and Lemma 3, the graph $G_\pi$ satisfies the conditions of the final part of Lemma 5. Therefore, relation (8) is false for $G_\pi$. On the other hand, by our assumption it is valid for the graph $G$, and by Lemma 2B it is valid in the case of the initial probability distribution $a$ concentrated at vertices of the graph $H_1$. However, the sorted probability list of the graph $G$ with such an initial distribution satisfies the evident relation $p_{a,G}(t) \ge p_{a,G_\pi}(t)$. Therefore, relation (8) for the graph $G$ is violated, which was to be proved. $\Box$

6 Proof of Theorem 1.3

In this case, the nontrivial strongly connected components of the graph $G$ are the considered cycles, and the graph $G'$ is obtained by contraction of these cycles. Denote by $c'$ the cycle $\arg\max_{c \in C} \Pr(c)$. Let $v$ be one of the vertices of this cycle, and let $v'$ be the vertex of the graph $G'$ corresponding to the cycle $c'$.

By condition (2), there exists a word $w$ containing the state $E_v$. We set $a = \max_{c \in C} \Pr(c)$ (note that $a < 1$), $c_1 = \Pr(w)$, $w^{(0)} = w$, and let $w^{(i)}$ be the word obtained from $w^{(i-1)}$ by inserting into it the sequence of states that corresponds to one traversal of the cycle $c'$. Evidently, $\Pr(w^{(i)}) = c_1 a^i$. The words $w^{(0)}, \dots, w^{(t-1)}$ are pairwise distinct (their lengths differ), so by definition $p(t) \ge \Pr(w^{(t-1)}) = c_1 a^{t-1} \ge c_1 a^t$. The lower exponential bound is proved.
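To make the construction of the words $w^{(i)}$ concrete, here is a minimal Python sketch on a small hypothetical chain (it is not taken from the paper). The letters are 1, 2, 3, state 0 is the absorbing "space", and the only nontrivial cycle is $1 \to 2 \to 1$. The function `sorted_word_probs` enumerates every word whose probability is at least a threshold. Because probabilities only decrease along a trajectory, its initial part can be pruned as soon as it falls below the threshold. The sketch then checks $\Pr(w^{(i)}) = c_1 a^i$ and the bound $p(t) \ge c_1 a^t$. All names and transition probabilities are illustrative assumptions.

```python
# Hypothetical chain: letters 1, 2, 3; state 0 is the absorbing "space".
# The only nontrivial cycle is 1 -> 2 -> 1, with probability 0.5 * 0.4 = 0.2.
INIT = {1: 1.0}
P = {
    1: {2: 0.5, 3: 0.3, 0: 0.2},
    2: {1: 0.4, 0: 0.6},
    3: {0: 1.0},
}

def word_prob(word, init=INIT, trans=P):
    """Probability of the trajectory word = (E_i1, ..., E_ik, E_0)."""
    pr = init.get(word[0], 0.0)
    for s, nxt in zip(word, word[1:]):
        pr *= trans[s].get(nxt, 0.0)
    return pr

def sorted_word_probs(threshold, init=INIT, trans=P, absorbing=0):
    """Probabilities >= threshold of all words, in decreasing order.
    A prefix is at least as probable as each of its extensions, so
    prefixes below the threshold are pruned."""
    found = []
    stack = [((s,), pr) for s, pr in init.items() if pr >= threshold]
    while stack:
        word, pr = stack.pop()
        for nxt, q in trans[word[-1]].items():
            new_pr = pr * q
            if new_pr < threshold:
                continue
            if nxt == absorbing:
                found.append(new_pr)
            else:
                stack.append((word + (nxt,), new_pr))
    return sorted(found, reverse=True)

def inserted(i):
    """w^(i): the word w = (E_1, E_0) with i traversals of the cycle 1 -> 2 -> 1."""
    return (1,) + (2, 1) * i + (0,)

a = P[1][2] * P[2][1]   # maximal cycle probability (there is only one cycle here)
c1 = word_prob((1, 0))  # probability of the word w containing E_1
p = sorted_word_probs(1e-6)
print(p[:6])
```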

Let us now prove that $p(t)$ decreases faster than any power function. Let $W$ be the set of all words with nonrepeating states. Evidently, each word $w$ can be obtained from some word $w' \in W$ by insertion of cycles. Some of these cycles may repeat; however, the repeated copies have to be consecutive in the considered path (since the graph $G'$ is acyclic, it is impossible that a path in the graph $G$ first goes through some cycle $c$, then through a part that has no common vertices with this cycle, and then again through vertices of the same cycle $c$). The order of nonrepeating cycles is defined by the word $w'$. Note that the result of the insertion does not depend on the state (the letter) after which a fixed cycle $c$ is inserted into the word. For example, for the last (fifth) diagram in Fig. 1, inserting the cycle $E_1 \to E_2 \to E_1$ into the word $(E_1, E_2, E_0)$ after the "letter" $E_2$ or after the "letter" $E_1$ gives one and the same word $(E_1, E_2, E_1, E_2, E_0)$.

Note that any word whose length exceeds $nt$ contains at least $t - 1$ cycles, and each cycle has probability at most $a$. Consequently, Proposition 2 is valid.

Proposition 2 If $L(w) > nt$, then $\Pr(w) \le a^{t-1}$.

If $L(w) > n(\tau + 1)$, then by Proposition 2 we have $\Pr(w) \le a^{\tau}$. We are interested in an upper bound for the number of words $w$ such that $\Pr(w) > a^{\tau}$. To obtain this bound, it suffices to count the words whose length does not exceed $n(\tau + 1)$.

Let us prove that, under the assumptions of the theorem, the number of words whose length does not exceed $x$ is upper bounded by $|W|(x+n)^n/n$. Indeed, any word-path of length $i$ contains no more than $i$ cycles. Evidently, the graph $G$ has no more than $n$ different cycles. Since the number of combinations with repetitions from $n$ by $i$ equals $\binom{n+i-1}{i}$, the total number of words of length $i$ is upper bounded by $|W|\binom{n+i-1}{i}$. Summing over $i$ from 0 to $x$, we obtain

$$|W| \sum_{i=0}^{x} \binom{n+i-1}{i} = |W| \binom{n+x}{n} = \frac{|W|(1+x)}{n}\binom{n+x}{x+1} \le \frac{|W|(x+n)^n}{n},$$

which gives the desired value.
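The summation step rests on the hockey-stick identity $\sum_{i=0}^{x}\binom{n+i-1}{i} = \binom{n+x}{n}$. The short Python check below confirms this identity, the rewriting through $\binom{n+x}{x+1}$, and the final power bound for small $n$ and $x$, using only `math.comb`. The helper names `check_identities` and `words_up_to_length_bound` are hypothetical.

```python
from math import comb

def words_up_to_length_bound(W_size, n, x):
    """|W| * sum_{i=0}^{x} C(n+i-1, i): the bound on the number of words of length <= x."""
    return W_size * sum(comb(n + i - 1, i) for i in range(x + 1))

def check_identities(n, x):
    s = sum(comb(n + i - 1, i) for i in range(x + 1))
    return (s == comb(n + x, n)                                    # hockey-stick identity
            and (1 + x) * comb(n + x, x + 1) == n * comb(n + x, n)  # rewriting via C(n+x, x+1)
            and n * comb(n + x, n) <= (x + n) ** n)                 # final bound (x+n)^n / n

print(all(check_identities(n, x) for n in range(1, 8) for x in range(0, 40)))
print(words_up_to_length_bound(3, 4, 20))
```

All arithmetic is exact integer arithmetic, so the check involves no rounding.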

The obtained bound implies that the number of words whose length does not exceed $n(\tau + 1)$ is upper bounded by $f(\tau) = |W| n^{n-1} (\tau + 2)^n$. Comparing this assertion with Proposition 2, we conclude that for $t > f(\tau)$ the inequality $p(t) \le a^{\tau}$ holds. Therefore, $p(t)$ is upper bounded by a subexponential function of the form $\exp\{-\mathrm{const}\cdot t^{1/n}\}$, which proves the relation $p(t) = o(t^{-x})$ for every $x > 0$.
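The passage from $f(\tau)$ to $p(t) = o(t^{-x})$ can be traced numerically. The following minimal Python sketch assumes hypothetical parameters $n = 2$, $|W| = 3$, $a = 0.5$ and the power $x = 3$. For each $t$ it takes the largest $\tau$ with $f(\tau) < t$, so that $p(t) \le a^{\tau}$, and it works with logarithms to avoid floating-point underflow. The helper names are illustrative only.

```python
import math

def largest_tau(t, n, W_size):
    """Largest integer tau with f(tau) = |W| n**(n-1) (tau + 2)**n < t (may be negative)."""
    base = W_size * n ** (n - 1)
    tau = int((t / base) ** (1 / n)) - 2   # floating-point estimate
    while base * (tau + 2) ** n >= t:       # correct downwards
        tau -= 1
    while base * (tau + 3) ** n < t:        # correct upwards
        tau += 1
    return tau

def log_power_times_bound(t, x, n, W_size, a):
    """ln(t**x * a**tau), where p(t) <= a**tau for t > f(tau) (trivial bound if tau < 0)."""
    tau = max(largest_tau(t, n, W_size), 0)
    return x * math.log(t) + tau * math.log(a)

# Hypothetical parameters: n = 2, |W| = 3, a = 0.5, power x = 3.
for t in (10**4, 10**6, 10**8, 10**10):
    print(t, largest_tau(t, 2, 3), round(log_power_times_bound(t, 3, 2, 3, 0.5), 1))
```

The logarithm of $t^x a^{\tau}$ becomes hugely negative as $t$ grows, because $\tau$ grows like $t^{1/n}$ while $x \ln t$ grows only logarithmically.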

Let us now prove the necessary and sufficient conditions for the exponential decrease. Assume that the graph $G$ contains a path going through vertices of two cycles; namely, it first goes through vertices of a cycle $c''$ and then through those of a cycle $c'''$. According to (2), there exists a word $w'$ containing both of these vertices-states in the same order. Denote by $w^{(\tau)}$ a word obtained from $w'$ by insertion of states corresponding to $\tau$ cycles, each of which is either $c''$ or $c'''$. Note that there are $\tau + 1$ such words $w^{(\tau)}$, since each of them corresponds to a combination with repetitions from 2 by $\tau$. For any $w^{(\tau)}$ we have $\Pr(w^{(\tau)}) \ge \Pr(w')\,\delta^{\tau}$, where $\delta = \min\{\Pr(c''), \Pr(c''')\}$. Therefore, with $t = 1 + \dots + (\tau + 1) = (\tau + 1)(\tau + 2)/2$ we get $p(t) \ge \mathrm{const}\cdot\delta^{\tau} \ge \mathrm{const}\cdot\delta^{\sqrt{2t}}$. This contradicts the relation $p(t) = O(\exp\{-\gamma t\})$ for any $\gamma > 0$.
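The counting argument with two cycles can be reproduced with a minimal Python sketch. It assumes hypothetical values $\Pr(w') = 0.1$, $\Pr(c'') = 0.5$, $\Pr(c''') = 0.4$ and lists only the words $w^{(\tau)}$ for $\tau \le 40$. Since these are just some of all words, the $t$th entry of this list is a lower bound for $p(t)$. The sketch checks $p(t) \ge \Pr(w')\,\delta^{\tau} \ge \Pr(w')\,\delta^{\sqrt{2t}}$ for $t = (\tau+1)(\tau+2)/2$. It also shows that $\ln p(t) + \gamma t$ grows for a fixed $\gamma > 0$, so no bound $O(\exp\{-\gamma t\})$ can hold. The function name is illustrative only.

```python
import math

def two_cycle_word_probs(pr_w, pr_c2, pr_c3, max_tau):
    """Probabilities of the words w^(tau), tau <= max_tau: the word w' with
    i copies of c'' and tau - i copies of c''' inserted (i = 0..tau)."""
    probs = [pr_w * pr_c2 ** i * pr_c3 ** (tau - i)
             for tau in range(max_tau + 1) for i in range(tau + 1)]
    return sorted(probs, reverse=True)

# Hypothetical values: Pr(w') = 0.1, Pr(c'') = 0.5, Pr(c''') = 0.4.
pr_w, pr_c2, pr_c3, max_tau = 0.1, 0.5, 0.4, 40
delta = min(pr_c2, pr_c3)
probs = two_cycle_word_probs(pr_w, pr_c2, pr_c3, max_tau)
gamma = 0.1  # any fixed gamma > 0 in the would-be bound p(t) = O(exp(-gamma t))
for tau in (10, 20, 40):
    t = (tau + 1) * (tau + 2) // 2
    # probs[t - 1] is a lower bound for the true p(t).
    print(tau, t, probs[t - 1] >= pr_w * delta ** math.sqrt(2 * t),
          round(math.log(probs[t - 1]) + gamma * t, 2))
```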

Let us now prove the last assertion of the theorem, about the constant $\gamma'$. Consider again the set $W$ of words with non-repeating states. Now each word $w$ can be obtained from some word $w' \in W$ by insertion of one and the same cycle (possibly repeated several times). Here a concrete cycle $c \in C$ can be inserted only into $k(c)$ words; denote the set of such words by $K(c)$. Evidently, $\Pr(w) = \Pr(w')\Pr(c)^m$, where $m$ is the number of copies of the cycle $c$ inserted into the word $w' \in K(c)$.

Let $p' < \min_{w' \in W} \Pr(w')$. Let us find the number $Q(p')$ of words whose probability exceeds $p'$. Evidently, all words in $W$ are such words. The remaining words are obtained from the words in $K(c)$ by insertion of cycles; the number of inserted cycles varies from 1 to $\lfloor (\ln p' - \ln \Pr(w'))/\ln \Pr(c) \rfloor$. Therefore, the difference $Q(p') - \ln p' \sum_{c \in C} k(c)/\ln \Pr(c)$ is bounded. The proved boundedness of the difference $Q(\exp\{-\gamma' x\}) - x$ for all $x > 0$ is equivalent to the boundedness of the difference $t - \ln\{1/p(t)\}/\gamma'$, which was to be proved.
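The counting of $Q(p')$ can be checked on made-up data. The following minimal Python sketch assumes three words in $W$ with probabilities 0.3, 0.2, 0.1, a cycle of probability 0.25 insertable into the words of probabilities 0.3 and 0.1, and a cycle of probability 0.5 insertable into the word of probability 0.2. With $\gamma' = \big(\sum_{c \in C} k(c)/\ln(1/\Pr(c))\big)^{-1}$, the main term $\ln p' \sum_c k(c)/\ln \Pr(c)$ equals $x$ when $p' = \exp\{-\gamma' x\}$. The difference $Q(p') - x$ then stays in a fixed bounded interval. All data and helper names are hypothetical.

```python
import math

def count_words_above(p_prime, W_probs, cycles):
    """Q(p'): number of words with probability > p', when every word is a word w'
    of W with copies of a single cycle c inserted, Pr(w) = Pr(w') Pr(c)**m.
    W_probs: [Pr(w') for w' in W]; cycles: [(Pr(c), [Pr(w') for w' in K(c)])]."""
    q = sum(1 for pw in W_probs if pw > p_prime)
    for pc, K in cycles:
        for pw in K:
            m = 1
            while pw * pc ** m > p_prime:   # words with m >= 1 copies of c
                q += 1
                m += 1
    return q

def main_term(p_prime, cycles):
    """ln p' * sum_c k(c) / ln Pr(c), with k(c) = |K(c)|."""
    return math.log(p_prime) * sum(len(K) / math.log(pc) for pc, K in cycles)

# Hypothetical data: three words in W and two cycles.
W_probs = [0.3, 0.2, 0.1]
cycles = [(0.25, [0.3, 0.1]), (0.5, [0.2])]
gamma_prime = 1 / sum(len(K) / math.log(1 / pc) for pc, K in cycles)
for x in (10, 20, 40, 80):
    p_prime = math.exp(-gamma_prime * x)
    print(x, count_words_above(p_prime, W_probs, cycles) - x)
```

For each $w' \in K(c)$ the number of admissible $m$ differs from $(\ln p' - \ln \Pr(w'))/\ln \Pr(c)$ by less than 1, which is where the bounded difference comes from.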